Reward Hierarchical Temporal Memory Model for Memorizing and Computing Reward Prediction Error by Neocortex
Authors
Abstract
In humans and animals, the reward prediction error encoded by dopamine systems is thought to be central to the temporal-difference class of reinforcement learning (RL) algorithms. Using RL algorithms, many brain models have described the function of dopamine and related areas, including the basal ganglia and frontal cortex. Despite this importance, how the reward prediction error itself is computed is not well understood, including how current states are assigned to memorized states and how the values of those states are stored. In this paper, we describe a neocortical model for memorizing state space and computing reward prediction error, called 'reward hierarchical temporal memory' (rHTM). In this model, the temporal relationships among events are stored hierarchically. Using this memory, rHTM computes reward prediction errors by associating the memorized sequences with rewards and inhibiting the predicted reward. In a simulation, our model behaved similarly to dopaminergic neurons. We suggest that our model can provide a hypothetical framework for the interaction between the cortex and dopamine neurons.
Keywords: reward; reinforcement learning; reward prediction error; temporal difference; HTM; rHTM
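The abstract does not give the underlying equations, but the temporal-difference prediction error it refers to is conventionally written as delta = r + gamma * V(s') - V(s). The sketch below illustrates that standard quantity only; the state names, discount factor gamma, and learning rate alpha are illustrative assumptions and not the authors' rHTM implementation.

```python
# Minimal sketch of a temporal-difference (TD) reward prediction error,
# the signal dopamine neurons are thought to encode. This is NOT the
# authors' rHTM model; gamma, alpha, and the states are assumptions.

gamma = 0.9   # assumed discount factor
alpha = 0.1   # assumed learning rate
values = {}   # learned value V(s) for each memorized state

def td_error(state, reward, next_state):
    """delta = r + gamma * V(s') - V(s): large for an unpredicted reward,
    near zero when the reward is fully predicted (i.e. 'inhibited')."""
    v = values.get(state, 0.0)
    v_next = values.get(next_state, 0.0)
    return reward + gamma * v_next - v

def update(state, reward, next_state):
    # Move V(s) toward the TD target by a small step alpha * delta.
    delta = td_error(state, reward, next_state)
    values[state] = values.get(state, 0.0) + alpha * delta
    return delta

# Usage: after repeated cue -> reward pairings, the error at reward delivery
# shrinks, mimicking the dampened dopamine response to a predicted reward.
for trial in range(50):
    update("cue", 0.0, "reward_state")
    update("reward_state", 1.0, "terminal")
print(round(td_error("reward_state", 1.0, "terminal"), 3))  # approaches 0
```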
Similar articles
Generating Adaptive Behaviour within a Memory-Prediction Framework
The Memory-Prediction Framework (MPF) and its Hierarchical-Temporal Memory implementation (HTM) have been widely applied to unsupervised learning problems, for both classification and prediction. To date, there has been no attempt to incorporate MPF/HTM in reinforcement learning or other adaptive systems; that is, to use knowledge embodied within the hierarchy to control a system, or to generat...
Trial-by-Trial Modulation of Associative Memory Formation by Reward Prediction Error and Reward Anticipation as Revealed by a Biologically Plausible Computational Model
Anticipation and delivery of rewards improves memory formation, but little effort has been made to disentangle their respective contributions to memory enhancement. Moreover, it has been suggested that the effects of reward on memory are mediated by dopaminergic influences on hippocampal plasticity. Yet, evidence linking memory improvements to actual reward computations reflected in the activit...
Reward prediction error signals by reticular formation neurons.
As a key part of the brain's reward system, midbrain dopamine neurons are thought to generate signals that reflect errors in the prediction of reward. However, recent evidence suggests that "upstream" brain areas may make important contributions to the generation of prediction error signals. To address this issue, we recorded neural activity in midbrain reticular formation (MRNm) while rats per...
Updating dopamine reward signals
Recent work has advanced our knowledge of phasic dopamine reward prediction error signals. The error signal is bidirectional, reflects well the higher order prediction error described by temporal difference learning models, is compatible with model-free and model-based reinforcement learning, reports the subjective rather than physical reward value during temporal discounting and reflects subje...
Early and late consolidation and reconsolidation of memory in the prelimbic cortex
Rats can learn to forage among olfactory cues and associate one with reward in only 3 massed trials. The learning is achieved in less than 10 min and results in a memory trace lasting at least 1 week. To study the neuroanatomical circuits involved in memory formation, we used immunoreactivity to the immediate early gene c-fos as a marker for neuronal activity induced by the learning. The p...
Journal title:
Volume, Issue:
Pages:
Publication date: 2012